A Brief Survey of Parametric Value Function Approximation

Authors

  • Matthieu Geist
  • Olivier Pietquin
Abstract

Reinforcement learning is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. An important subtopic of reinforcement learning is computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific way to minimize it, almost always stochastic gradient descent or a recursive least-squares approach.
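As an illustration of the bootstrapping category combined with stochastic gradient descent, the following is a minimal sketch of semi-gradient TD(0) with a linear value function. It is not taken from the survey: the function name `td0_linear`, the two-state toy process, the one-hot features, and the step size are all assumptions chosen only to keep the example small and checkable.

```python
import random


def td0_linear(features, transitions, gamma=0.9, alpha=0.1, steps=5000, seed=0):
    """Semi-gradient TD(0) for a linear value function V(s) = theta . phi(s).

    features:    one feature vector per state (hypothetical toy input).
    transitions: transitions[s] is a list of (probability, next_state, reward)
                 tuples describing a fixed policy's Markov reward process.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    s = 0
    for _ in range(steps):
        # Sample one transition out of the current state.
        u, acc = rng.random(), 0.0
        for p, s_next, reward in transitions[s]:
            acc += p
            if u <= acc:
                break
        phi, phi_next = features[s], features[s_next]
        v = sum(t * f for t, f in zip(theta, phi))
        v_next = sum(t * f for t, f in zip(theta, phi_next))
        # Bootstrapped target: the reward plus the discounted *current estimate*
        # of the next state's value stands in for the unknown true return.
        td_error = reward + gamma * v_next - v
        # Stochastic gradient step; for a linear model the gradient of V(s)
        # with respect to theta is simply phi(s).
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        s = s_next
    return theta


# Assumed toy problem: state 0 -> state 1 with reward 1, state 1 -> state 0
# with reward 0, both deterministic, with one-hot (tabular) features.
features = [[1.0, 0.0], [0.0, 1.0]]
transitions = [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]]
theta = td0_linear(features, transitions, gamma=0.9, alpha=0.1, steps=5000)
```

For this toy process the exact values solve V(0) = 1 + 0.9 V(1) and V(1) = 0.9 V(0), so V(0) = 1 / (1 - 0.81) and V(1) = 0.9 V(0); with one-hot features `theta` should approach these two numbers. A residual approach would instead differentiate the squared TD error through both `v` and `v_next`, and a projected fixed-point approach would solve for `theta` with least squares rather than taking gradient steps.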


Similar resources

Optimal Pareto Parametric Analysis of Two Dimensional Steady-State Heat Conduction Problems by MLPG Method

Numerical solutions obtained by the Meshless Local Petrov-Galerkin (MLPG) method are presented for two dimensional steady-state heat conduction problems. The MLPG method is a truly meshless approach, and neither the nodal connectivity nor the background mesh is required for solving the initial-boundary-value problem. The penalty method is adopted to efficiently enforce the essential boundary co...


APPROXIMATION OF 3D-PARAMETRIC FUNCTIONS BY BICUBIC B-SPLINE FUNCTIONS

In this paper, we propose a method to approximate a parametric 3D function by bicubic B-spline functions.


A uniform approximation method to solve absolute value equation

In this paper, we propose a parametric uniform approximation method to solve NP-hard absolute value equations. For this, we uniformly approximate absolute value in such a way that the nonsmooth absolute value equation can be formulated as a smooth nonlinear equation. By solving the parametric smooth nonlinear equation using Newton method, for a decreasing sequence of parameters, we can get the ...


An Algorithmic Survey of Parametric Value Function Approximation

Reinforcement learning is a machine learning answer to the optimal control problem. It consists in learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of reinforcement learning is to compute an approximation of this value function when the system is too large f...


Minimizing a General Penalty Function on a Single Machine via Developing Approximation Algorithms and FPTASs

This paper addresses the Tardy/Lost penalty minimization on a single machine. According to this penalty criterion, if the tardiness of a job exceeds a predefined value, the job will be lost and penalized by a fixed value. Besides its application in real world problems, Tardy/Lost measure is a general form for popular objective functions like weighted tardiness, late work and tardiness with reje...




Publication date: 2010